
Add local dynamic filter support in IcebergPageSourceProvider #5719

Closed

Conversation

@knwg145 knwg145 commented Oct 28, 2020

Uses the dynamic filter in IcebergPageSourceProvider to filter results

@cla-bot cla-bot bot added the cla-signed label Oct 28, 2020
@sopel39 sopel39 requested a review from raunaqmorarka October 28, 2020 13:12
@raunaqmorarka
Member

@knwg145 could you please push the code changes for the comments you've marked as resolved?

@knwg145 knwg145 force-pushed the iceberg-local-dynamic-filtering branch from fb6f893 to e03c88a Compare November 2, 2020 09:51
@knwg145 knwg145 force-pushed the iceberg-local-dynamic-filtering branch from e03c88a to 5440fb3 Compare November 2, 2020 09:58
@@ -29,6 +29,7 @@
import io.prestosql.execution.warnings.WarningCollector;
import io.prestosql.metadata.Metadata;
import io.prestosql.operator.OperatorStats;
import io.prestosql.server.DynamicFilterService;
Member

unused

DistributedQueryRunner runner = (DistributedQueryRunner) getQueryRunner();
ResultWithQueryId<MaterializedResult> result = runner.executeWithQueryId(
withBroadcastJoin(),
"SELECT * FROM lineitem JOIN orders ON lineitem.orderkey = orders.orderkey");
Member

Please use lineitem.suppkey = supplier.suppkey instead.
In this case the dynamic filter is set to all because the build side is too big, whereas we want to test the case where there is a dynamic filter but it doesn't filter anything.
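
For reference, a minimal sketch of the suggested variant, reusing the runner setup from the snippet above (supplier is the standard TPC-H table, so every lineitem row finds a match and the DF filters nothing):

DistributedQueryRunner runner = (DistributedQueryRunner) getQueryRunner();
// The DF collected from the supplier build side is populated, but it
// eliminates no probe-side rows, since every lineitem.suppkey has a
// matching supplier row.
ResultWithQueryId<MaterializedResult> result = runner.executeWithQueryId(
        withBroadcastJoin(),
        "SELECT * FROM lineitem JOIN supplier ON lineitem.suppkey = supplier.suppkey");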

@@ -1389,4 +1395,55 @@ private void dropTable(String table)
assertUpdate(session, "DROP TABLE " + table);
assertFalse(getQueryRunner().tableExists(session, table));
}

@Test
public void testLocalDynamicFilterWithEmptyBuildSide()
Member

These tests rely on the fact that the probe side will not progress until the build side is ready. Because of that, these tests are fragile and might break when we change execution (which we will).
In order to write these tests reliably, Iceberg should support either waiting for the local DF or waiting for the DF during split generation. Then (in tests) we can enforce the presence of the DF before the table scan happens.

Member

Could we tackle introducing waiting for the DF during split generation (like Hive) separately, and let this one go through with less restrictive assertions?
Interestingly, TestHiveDistributedJoinQueries#testJoinWithEmptyBuildSide is not flaky despite not using a lazy DF to delay the probe side.

Member

This would prevent optimizations like #3957.
I'm fine with having local DF without testing for now (it's a one-liner), but please create an issue in prestosql (for blocking on DFs locally) and mention it in the code.

Alternatively, we can simply add that blocking, because it's pretty straightforward (just pass the DF future to IcebergPageSource and block there).

@@ -167,7 +168,7 @@ public ConnectorPageSource createPageSource(
split.getLength(),
split.getFileFormat(),
regularColumns,
table.getPredicate());
table.getPredicate().intersect(dynamicFilter.getCurrentPredicate().transform(IcebergColumnHandle.class::cast).simplify(100)));
Member

If you passed dynamicFilter to IcebergPageSource, you could easily block on the DF in io.prestosql.spi.connector.ConnectorPageSource#isBlocked. However, this should be behind a feature toggle.
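
A minimal sketch of that idea (not the PR's actual code), assuming IcebergPageSourceProvider hands the page source a future that completes once the DF has been collected; how that future is obtained, and the name of the feature toggle, are left open here:

// Inside IcebergPageSource (sketch): set from the constructor by the provider;
// completes once the dynamic filter is ready.
private final CompletableFuture<?> dynamicFilterFuture;

@Override
public CompletableFuture<?> isBlocked()
{
    // The engine does not call getNextPage() while the returned future is
    // incomplete, so returning the DF future makes the scan wait for the DF.
    return dynamicFilterFuture;
}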

Member

It seems from the code that stripe pruning happens before IcebergPageSource is created, in createDataPageSource -> createOrcPageSource -> reader.createRecordReader -> OrcRecordReader. So we would miss that even if we block in IcebergPageSource.
I think row group pruning could still be accomplished by blocking on the DF in IcebergPageSource if dynamicFilter is pushed into StripeReader as well. I'm not sure the changes are worth doing, though.
Please correct me if I'm missing something.

Member

We could probably call createDataPageSource from IcebergPageSource in a lazy way (once the DF is ready). That way we don't allocate resources until the DF is ready (I'm not sure how big of an issue that is, though).
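
A rough sketch of that lazy variant (illustrative names, not the PR's code; the predicate and dynamicFilter fields and this createDataPageSource overload are assumptions), building on the blocking isBlocked() idea above so that getNextPage() is only reached once the DF is complete:

// Inside IcebergPageSource (sketch): create the real data page source on
// first use, after the dynamic filter has become ready.
private ConnectorPageSource delegate;

@Override
public Page getNextPage()
{
    if (delegate == null) {
        // isBlocked() has completed by now, so the DF predicate is final and
        // can be intersected into the scan predicate before any ORC readers
        // or buffers are allocated (mirrors the change in createPageSource).
        delegate = createDataPageSource(
                predicate.intersect(dynamicFilter.getCurrentPredicate()
                        .transform(IcebergColumnHandle.class::cast)
                        .simplify(100)));
    }
    return delegate.getNextPage();
}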

@raunaqmorarka
Member

Superseded by #9538
